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Abstract 

A common situation in filtering where classical Kalman filtering does not perform par- 
ticularly well is tracking in the presence of propagating outliers. This calls for robustness 
understood in a distributional sense, i.e.; we enlarge the distribution assumptions made in 
the ideal model by suitable neighborhoods. Based on optimality results for distributional- 
robust Kalman filtering from Ruckdeschel (2001, 2010b), we propose new robust recursive 
filters and smoothers designed for this purpose as well as specialized versions for non- 
propagating outliers. We apply these procedures in the context of a GPS problem arising 
in the car industry. To better understand these filters, we study their behavior at stylized 
outlier patterns (for which they are not designed) and compare them to other approaches 
for the tracking problem. Finally, in a simulation study we discuss efficiency of our proce- 
dures in comparison to competitors. 

Keywords: 93E11,62F35 
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1 Introduction 



Motivation State space models (SSMs) build a flexible but still manageable class of dynamic 
models, and together with corresponding attractive procedures as the Kalman filter and its 
extensions they provide an immensely useful tool for a wide range of applications. In this 
paper we are focussing on an application in engineering in the context of a GPS problem 
arising in the car industry with different linear and non-linear state space formulations. 

It is common knowledge for long that most of the classical procedures, used in this do- 
main suffer from lack of robustness, in particular the Kalman filters and smoothers which we 
concentrate on. 

The mere notion of robustness however in filtering context is not canonic, with the general 
idea to describe stability of the procedure w.r.t. variations of the "input parameters". The 
choice of the "input parameters" to look at varies from notion to notion passing from initial 
values to bounds on user-defined controls to distributional assumptions. As general in robust 
statistics, we are concerned with distributional or outlier robustness; i.e.; our input parameters 
are the model distributions and defining suitable neighborhoods about the ideal model we allow 
for deviations in the respective assumptions which capture various types of outliers. 

The amount of literature on robustifications of these procedures is huge, so we do not 
attempt to give a comprehensive account here. Instead, we refer to the surveys given in Ershov 
and Liptser (1978), Kassam and Poor (1985), Stockinger and Dutter (1987), Schick and Mitter 
(1994), Kiinsch (2001), and to some extent Ruckdeschel (2001, Sect. 1.5). In Section 3.2 below, 
we list references related more closely to our actual approach. 

Problem statement In our context these outliers may be system- endogenous (i.e., propa- 
gating) or -exogenous (i.e. non-propagating), which induces the somewhat conflicting goals of 
tracking and attenuation. 

If we head for robust optimality in the sense of minimizing mean squared error on neighbor- 
hoods, the standard outlier models from Fox (1972) pose problems barely tractable in closed 
form, although some approximate solutions have been given, e.g., in Masreliez and Martin 
(1977). A way out is given by outliers of substitutive type where closed-form saddle-points 
could be derived for the attenuation problem in filtering in Ruckdeschel (2001) which involve 
the hard-to-compute ideal conditional expectation. If however this conditional expectation is 
linear, we come up with an easy and fast robustification (rLS) which also is available in multi- 
variate settings. Although exact linearity can be disproved, approximate linearity usually holds, 
at least in a central region of the distribution. 

This approach has been generalized to simple tracking problems where we can completely 
observe the states (i.e., the observation matrix is invertible) in Ruckdeschel (2010a,b). In this 
paper, we consider the general situation, i.e., tracking in situations where observation dimension 
is lower than the one of the states. To this end we propose a suitable generalization of the rLS 
to this tracking problem and discuss its limitations. 

To be able to apply these techniques in the EM algorithm for parameter estimation in 
state space models (see Shumway and Stoffer (1982)) we also generalize our procedures to the 
fixed interval smoothing problem where we come up with a strictly recursive solution as in the 
classical case going backward from the last state to first one. 

As needed in the application, we also make available our techniques for Extended Kalman 
filtering and smoothing, which to the best of our knowledge is novel. 

Organization of the paper Our paper is organized as follows: In the text, the focus is on an 
understanding of the procedures and their behavior at data, so we delegate mathematical results 
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to the appendix. We put some emphasis though on complete specification of the procedures. 

In Section 2, we introduce the general framework of linear, time varying SSMs and their 
extension as to outlier models. In Section 3 we define the filtering and smoothing problems and 
recall the classical optimal solutions, i.e., the Kalman filter and smoother and contrast these 
to our robustifications based on the rLS. This section also introduces the central procedure 
of this paper, the rLS. 10, and gives the necessary generalizations to cover the smoothing and 
the Extended Kalman filter case, and, for comparison, introduces other robust non-parametric 
alternatives. In Section 4, we study their behavior in the ideal situation and at stylized outlier 
patterns for which they are not necessarily designed. In particular we demonstrate that the 
tracking problem at a model with a non-trivial null space of the observation matrix leads to 
problems where, at least at filtering time, for certain outliers, any procedure must fail. Section 5 
presents an application of our procedures in the car industry dealing with GPS problems arising 
there. We sketch three different SSMs used to model this situation, in particular including a 
non-linear one, making robust Extended Kalman filters indispensable. Section 6 provides a 
comparative simulation study in which the outliers are generated according to the ones for 
which the procedures have been defined. Conclusions are gathered in Section 7. 

As to the mathematical details to our procedures, Appendix A.l presents some derivation 
of the optimality of the Kalman filter among all linear filters without any rank-conditions for 
the arising matrices and under more general semi-norms than the Euclidean one. Appendix A. 2 
gives a brief account on the optimality of the rLS, citing relevant results. Finally, Appendix A. 3 
contains the mathematical core of this paper, giving the necessary generalizations in order to 
translate the optimality results from the AO case to the tracking problem. 

2 Setup 

2.1 Ideal model 

We consider a linear SSM consisting in an unobservable p-dimensional state X t evolving accord- 
ing to a possibly time-inhomogeneous vector autoregressive model of order 1 with innovations 
v t and transition matrices F t G M pxp , i.e., 

X t = F t X t _ 1 +v t (2.1) 

a g-dimcnsional observation Y t which is a linear transformation X t involving an additional 
observation error e t and a corresponding observation matrix Z t e R gxp 7 

Y t = Z t X t + e t (2.2) 

In the ideal model we work in a Gaussian context, that is we assume 

K&Qt), et'^Af^Vt), X ~ AA p (a ,Qo), (2.3) 
{X ,v s ,e t , s 7 t € N} stochastically independent (2.4) 

For this paper, we assume the hyper-parameters F t , Z t ,Qt,V t , ao to be known. 

2.2 Deviations from the ideal model 

As announced, these ideal model assumptions for robustness considerations are extended by 
allowing (small) deviations, most prominently generated by outliers. In our notation, suffix 
"id" indicates the ideal setting, "di" the distorting (contaminating) situation, "re" the realistic, 
contaminated situation. 
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AOs and IOs In time series context the most important distinction to be made as to outliers 
is whether they propagate or not. To label these different types, for historical reasons, we use 
the terminology of Fox (1972) (albeit in a somewhat more general sense). Fox distinguishes 
innovation outliers (or IOs) which enter the state layer and hence propagate and additive 
outliers (or AOs) which only affect single observations and do not propagate. Originally AOs 
and IOs denote gross errors affecting the observation errors and the innovations, respectively 
We use these terms in a wider sense: IOs also may cover level shifts or linear trends which would 
not be included in the original definition. Similarly AO here will denote general exogenous 
outliers which do not propagate. 

More specifically, our procedures rLS.AO and rLS.IO defined below both assume substitu- 
tive outliers, but should also provide protection to some extent to arbitrary outliers of (wide- 
sense) AO respectively 10 type. To be precise, rLS.AO assumes a substitutive outlier (SO) 
model already used by Birmiwal and Shen (1993) and Birmiwal and Papantoni-Kazakos (1994), 
i.e., 

y ro = (1 - U)Y' d + UY d \ U ~ Bin(l, r) (2.5) 

for SO-contamination radius < r < 1 specifying the size of the corresponding neighborhood, 
and where U is assumed independent of (X, Y id ) and (X, Y di ) as well as 

Y di , X independent (2.6) 

As usual, the contaminating distribution C(Y di ) is arbitrary, unknown and uncontrollable. 
Similarly rLS.IO assumes that (2.1) is split up into two steps, 

X t = F t X?_ 1 +v?, X? = (l-U t )X t + U t X«, Y? = Z t X? + (1 - U t )ef (2.7) 

where X I t °_ l is the state according to the contaminated past, vf and ef are an ideally distributed 
innovation respectively observation error, and U t and A t dl are defined in analogy to Ut and y dl 
(i.e., with independence from all ideal distributions and the past). 

Different and competing goals induced by AOs and IOs Due to their different nature, 
as a rule, a different reaction in the presence of IOs and AOs is required. As AOs are exogenous, 
we would like to ignore them as far as possible, damping their effect, while when there are IOs, 
something has happened in the system, so the usual goal will be to detect these structural 
changes as fast as possible. 

A situation where both AOs and IOs may occur is more difficult, as we cannot distinguish 
10 from AO type immediately after a suspicious observation; it will be treated elsewhere. 

Other deviation patterns Of course the two substitutive outlier types of Section 2.2 are 
by no means exhaustive: Trends and level shifts are not directly covered, neither are patterns 
where the outlier mechanism may know about the past like the patchy outliers described in 
Martin and Yohai (1986). Another type of outliers arc outliers in the oscillation behavior, in 
both amplitude/scale and frequency, or more generally spectral outliers as arising in robustly 
estimating spectral density functions, compare Franke (1985); Franke and Poor (1984), and 
Spangl (2008). We try to account at least for some of them in Section 4. 

3 Kalman filter and smoother and robust alternatives 

Filter problem The most important problem in SSM formulation is the reconstruction of 
the unobservable states X t by means of the observations Y t . For abbreviation let us denote 

Y\-.t — (Yi, . . . , Y t ), y 1:O :=0 (3.1) 
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Using MSE risk, the optimal reconstruction is the solution to 



E \X t - f t 
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ft measurable w.r.t. cr(Y 1:s ) 



(3.2) 



mm /( 



We focus on filtering (s = t) in this paper, while s < t makes for a prediction, and s > t for a 
smoothing problem. 

3.1 Classical method: Kalman filter and smoother 

The general solution to (3.2), the corresponding conditional expectation E[A t |Yi :s ] usually is 
rather expensive to compute. Hence as in the Gauss-Markov setting, restriction to linear filters 
is a common way out. In this context, Kalman (I960) introduced a recursive scheme to compute 
this optimal linear filter reproduced here for later reference: 

Initialization: X \ Q = a , £q|o = Qo (3-3) 



where T, t \t = Cov(X t — X t \ t ), £t|t-i = Cov(X t — X t \ t -i) 7 and K t is the so-called Kalman gain. 

The Kalman filter has a clear-cut structure with an initialization, a prediction, and a cor- 
rection step. Evaluation and interpretation is easy, as all steps are linear. The strict recursivity 
/ Markovian structure of the state equation allows one to concentrate all information from the 
past useful for the future in X t \ t -i- 

This linearity is also the reason for its non-robustness, as observations y enter unbounded into 
the correction step. A good robustification has to be bounded in the observations, otherwise 
preserving the advantages of the Kalman filer as far as possible. 

3.2 A robustification of the least squares solution (rLS) 

The idea of the procedures we discuss in this paper are based on robustifying recursive Least 
.Squares: rLS, a filter originally introduced for AOs only, compare Ruckdeschel (2000, 2001). 
Starting from a different route of translating regression M estimators to SSM context, Boncelet 
and Dickinson (1983, 1987); Cipra and Romera (1991) arrive at similar weight functions, albeit 
with an iterative procedure to solve for the M equations. More recently, restricted to the 
particular SSM class of Holt- Winter-Forecasting, Gelper, Fried and Croux (2010) take up 
this idea and use regression M estimators, which, in addition to the present rLS approach 
involve recursive scale estimation. The closest recent approach stems from Cipra and Hanzak 
(2011), who — without translating their clipping to a bias side condition for an explicit outlier 
model — show a Lemma-5-type optimality (see Problem (A. 8) below) for their procedure (with 
application to the special SSM case of exponential smoothing). Their solution is different from 
ours only in the choice of the norm used for clipping: They use a diagonal weighting matrix W 
whereas, in the AO case, we use no weighting; a weighting does appear though in the general 
IO solution, compare Lemma A.l below. 

In this paper, we also consider an IO-robust version of this concept, and hence here, in 
addition to the original definition, we append the suffixes "AO" and "JO" to distinguish the 
two versions. Let us begin with (wide-sense) AOs. 

With only AOs, there is no need for robustification in the initialization and prediction 
step, as no (new) observations enter. As introduced in Ruckdeschel (2000), we robustify the 



Prediction: 




F t S t _i| t _iJ7 + Q t 

Yt — Z t Xt\t-i, 

(I p - K t Z t )T, t \ t _-L, 



(3.4) 



Correction: 



(3.5) 
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correction step, replacing KAY by a Huberization Hb(KAY) where Hb(x) = xmm{l,b/\x\} 
for some suitably chosen clipping height b and some suitably chosen norm, natural candidates 
being Euclidean and Mahalanobis norm. It turns out that only little is gained if we account for 
this modification in the recursion for the filter covariance, so we leave this unchanged. That is, 
the only modification in the correction step becomes 

X t \ t =X Mt _ 1 +H b {K t *Y t ) (3.6) 

While this is a bounded substitute for the correction step in the classical Kalman filter, it still 
remains reasonably simple, is non iterative and hence especially useful for online-purposes. 

However it should be noted that, departing from the Kalman filter and at the same time 
insisting on strict rccursivity, we possibly exclude "better" non-recursive procedures. These 
procedures on the other hand would be much more expensive to compute. 

As sketched in Appendix A. 2, it can be shown that rLS not only is plausible, but also has 
some optimality properties. These are not the focus of this paper, though, and proofs of these 
properties appear elsewhere. 

Choice of the clipping height b For the choice of b, we have two proposals. Both are 
based on the simplifying assumption that E id [AA|Ay] is linear, which in fact turns out to 
only be approximately correct. The first one chooses b — b(S) according to an Anscombe 
(1960) criterion, 

E id | AX - H b {KAY)\ 2 = (1 + S) E id \AX - K AY \ 2 (3.7) 

where 5 may be interpreted as "insurance premium" to be paid in terms of efficiency. 

The second criterion uses the radius r e [0, 1] of the neighborhood U so (r) (defined in (A. 6)) 
and determines b — b(r) such that 

{l-r)E id {\KAY\-b)+=rb (3.8) 

This produces the minimax-MSE procedure for U so (r) (compare Section A. 2). Generalizing 
ideas of Ricder, Kohl and Ruckdeschel (2008), this criterion can be extended to situations 
where we only know that the radius lies in some interval, but we do not work this out here; for 
details see Ruckdeschel (2010b). 

Remark 3.1. It turns out that in filtering context, the second criterion reflects much better 
the problem inherent difficulty of a robustification: While 10% efficiency loss in the ideal model 
can be unreachable, because totally ignoring the new observation AY would only "cost" 5%, it 
may be much too little to produce a sizeable effect in other models. A radius of 0.1 however 
seems to lead to reasonable choices of b in most models. 

3.3 rLS.IO 

As noted, in the presence of IOs, we want to follow an IO outlier as fast as possible. 

Optimality results on distributional neighborhoods for the IO-case with the goal of a faster 
tracking to the best of our knowledge have not been considered by other authors so far. 

The Kalman filter in this situation does not behave as bad as in the AO situation, but still 
tends to be too inert. To improve upon this, let us first simplify our model to the situation 
where we have an unobservable but interesting state X <~ P x (dx) and where instead of X we 
rather observe the sum 

Y = X + e (3.9) 
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This equation reveals a useful symmetry of X and e: Apparently 

E[X\Y] = Y — E[e\Y] (3.10) 

Hence we follow Y more closely if we damp estimation of e, for which we use the rLS-filcr. 
We should note that doing so, we rely on "clean", i.e., ideally distributed errors e. With the 
obvious replacements, our optimality results from the appendix translate word by word to a 
corresponding results for IOs, compare Ruckdeschel (2010a, Thm. 4.1). 

In analogy to the definition of the rLS in equation (3.6), we set up an IO-robust version 
of the rLS as follows: We retain the initialization and prediction step of the classical Kalman 
filter and for the correction step, as proved in Corollary A. 4, we note that in the ideal model, 
as E[e t |AYt] = (I g — Z t K t )AY t , the correction step can also be written as 

Xt\t = Xt\t-i + Zf(AY t - E[e t \AY t }), (3.11) 

where Zf is a suitably generalized inverse for Z t , minimizing the respective MSE among all 
E[e|Ay]-measurable (and square integrable) functions, i.e., 

Zf := S t | t _i^ t T (ZJE t | t _iZ t )~ (3-12) 

The 10 robustification then simply consists in replacing E[e t |AYt] by H b ((I q — Z t K t )AY t ), i.e., 
the rLS. 10 correction step as 

X t \t = X t \ t -i + Z?[AY t - H b ((I q - Z t K t )AY t )] (3.13) 

Here the same arguments for the choice of the norm and the clipping height apply as for the 
AO-robust version of the rLS. 

To better distinguish IO- and AO-robust filters, let us call the IO-robust version rLS. 10 
and (for distinction) the AO-robust filter rLS. AO in the sequel. 

Remark 3.2. It is worth noting that also our IO-robust version is a filter, hence does not use 
information of observations made after the state to reconstruct; rLS. 10 is strictly recursive and 
non iterative, hence well-suited for online applications. 

3.4 Extended Kalman filter 

In the application we are heading for, in Model (M3) below, the SSM given by equations (2.1) 
and (2.2) is only a linearization of a nonlinear SSM given by 

X t = f t (X t _ 1 ,u t ,v t ) (3.14) 
Y t = z t (X t ,w t ,e t ) (3.15) 



for smooth, known functions ft and z t in the states X t , in the innovations Vt, in the observation 
errors e t and some user defined controls u t and w t . With v t = Ewj, e t — Ee t , this is linearized 
to give the following Extended Kalman filter taken from Wan and van der Merwe (2002) 



Initialization: 




= a , 


^0|0 




(3.16) 


Prediction: 


X t \t-i 


= ft(X t -i\t-i,u t ,v t ), 


s t lt_l 


= F t Y<t-i\t-iFf + B t Q t Bl 


(3.17) 




for F t 


= 9_f t ( Xl u t ,V t )\ Xt _ iitil 


B t 


= £jft{X t -i\t-i,u t ,v)\ Vt 




Correction: 


X t \t 


= X t \ t _ 1 +K t AY t , 


AY t 


= Y t - zt{x t \ t _i,w t ,e t ), 






K t 


= ^t\t-\ZlC t , 


S t | t 


= (Hp - K t Z t )Y, t \ t _i, 






c t 


= Z t Z tlt _ 1 ZJ + D t V t DJ, 






(3.18) 




for Z t 


= &Zt(x,Ut,V t )\ x , 


D t 


= ^^(^t|t-i,«t,£)L- t 
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As to the robustification of this algorithm by rLS.AO and rLS.IO, as in the linear case, we 
simply replace the term K t AY t by H b (K t AY t ) in the AO case and by Zf(AY t - H b ((I q - 
Z t K t )AY t ) in the 10 case. 

3.5 A robust smoother 

In many situations, in particular for an application of the EM-algorithm to estimate the hyper- 
parameters, it is common use to enhance the filtered values X t i t in retrospective, accounting 
for the information of Y t +i-.T which is available in the mean time. To retain recursivity, we 
use a corresponding backward recursion as to be found in Anderson and Moore (1990, Sec. 7. 4, 
(4.5)): 

Xt\T = X t \ t + Jt(Xt+i\T — Xt+i\t)i Jt = ^t\tFt^t+i\t (3.19) 
with respective update for the smoothing covariance, 

S 4 |t = S t | t + J t (E t+1 | T - E t+ i| t ) J t r (3.20) 

Writing (3.19) as X t \ T ~ X t \ t = J t [(X t+1]T - X t+1]t+1 ) + (X t+1]t+1 - X t+1]t )}, we see that 
the first summand in the brackets of the RHS is just the preceding iterate of the LHS in the 
recursion and the second summand is just the already robustified increment of the correction 
step in the filter. Hence, sticking to our outlier models for 10 and AO contamination which 
independently affect single A t 's and Y t 's the modification only has to be done in the (new) 
treatment of AY t , i.e., in the second summand, so there is no further need for robustification 
in the backwards loop. 

3.6 A nonparametric approach: median based filters 

A nonparametric approach to assess the robustness issues met with Kalman filtering, which does 
not use any state space formulation is to use median-type filters for this purpose. Standard 
median filters remove outliers and preserve level shifts but their output does not properly 
represent linear trends. The repeated median (RM) has been developed for the extraction of 
monotonic trends with intercept and slope from a time series. 

In order to combine advantages of several specifications of location and regression based 
median-type filters, a variety of hybrid filters has been introduced in Fried, Bernholt and 
Gather (2006). From these filters, in Section 6, we use the regression based and predictive 
approach PRMH. In addition, we use a location based approach MMH for smoothing (notation MMH 
taken over from R package robf ilter, Fried and Schcttlingcr (2010)). 

Because of their high breakdown points for window width k of ([k/2\ + l)/n for PRMH and 
(2[fc/2j + l)/n for MMH, see Fried, Bernholt and Gather (2006), these hybrid RM-type filters 
and smoothers arc prominent candidates for a first sweep over the data. 

For an automatic choice of the window width, the Adaptive On-line .Repeated Median 
.Regression (ADORE) filter (Schcttlingcr, 2009) provides robust on-line extraction by a moving 
window technique with adaptive window width selection. 

In our setting, the median-based filters as they stand can only be applied to situations with 
one-dimensional observations; generalizations to higher dimensions beyond mere coordinate- 
wise treatment have been introduced in Schettlinger (2009), but are not dealt with in this 
paper. Being non-parametric, these filters know nothing about an underlying SSM, so cannot 
be used for state reconstruction, except for the case that Z t = 1 (or for p = q, I q in higher 
dimensions). So we include these filters only in the steady state examples of Sections 4.2, 4.3 
and 6 (Model (SimB)). 
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Figure 1: Typical results of the state-space model for different situations: left no contamination, 
middle 10 contamination, right AO contamination 

4 Behavior of the filters at stylized outlier situations 

In this section, we study the behavior of our filters in the ideal situation and at stylized outlier 
patterns for which they are not necessarily designed — these cover AOs and IOs in the original 
sense, their behavior at trends and level shifts and at changes in the oscillation behavior. We 
also look at the effects of a non-trivial null space of the observation matrix such that the 
information given by the observation does not suffice to reconstruct the state. 

4.1 The ideal situation, spiky outliers and IOs 

To start with, we study the behavior of the robustified (extended) Kalman filter (EKF) in three 
different situations, namely no contamination, contamination by (classical) IOs and AOs, and 
use the following hyper parameters for the SSM: 

/ 20 \ f 0\ 

a ° - I o ) ■' Qo ~ \ o o J ' (4 r 

Ft = ( ) ' Zt = ( -0 3 3 1 ) ' Qt = ( 9 ) ' Vt = ( 9 ) ■ 

We note that the first coordinate of the above state process is a random walk and therefore 
non-stationary, whereas the second coordinate is just white noise. 

The innovations Vt and the errors e* of the observation process are simulated from a con- 
taminated bivariate normal distribution 

CAA 2 (r,0,i?, Mc ,i?c) = (l-r)AA 2 (0,i?) + rAA 2 ( Mc ,i? c ) , (4.2) 

with r, the amount of contamination, set to 10% and where R — Q t in case of the innovations 
Vt and R = Vt in case of the observation errors £t, and the moments of the contaminating 
distribution are given by 

Hi = (25, 30), R c = diag(0.9, 0.9) . (4.3) 

Then the bivariate filter estimates of the three simulated state-space models are computed 
using the classical Kalman and rLS filters. In Figure 1, we show typical realizations of the state 
process together with their filter estimates for different contamination situations. Only the first 
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coordinate of the estimated state vector is plotted. The thick black line is the true state process, 
while classical Kalman, rLS.IO, and rLS.AO filters are plotted as light red line, dotted green 
line, and dot-dashed blue line, respectively. We note that contamination by additive outliers 
cannot be seen directly in Figure 1 because only the observation equation is affected. However, 
the spikes, especially visible for the filter estimate of the classical Kalman filter, do indicate 
additive outliers. 



Ideal situation: The classical Kalman filter as well as its robustified versions, yield almost 
the same results. 



IO contamination: The rLS.IO filter is able to follow the true state almost immediately, 
whereas the classical Kalman filter is only able to track the true state with a certain delay. 

Spiky outliers: The rLS.AO filter is not affected by additive outliers whereas the classical 
Kalman filter is prone to them. 



4.2 Changes in oscillation pattern and level shifts 

Next, in situations with outliers scaling on different frequencies, we compare our robustified 
versions of the Kalman filter to a non-parametric filtering method, i.e., the ADORE filter 
proposed by Schettlinger (2009) — with an automatic selection of the window width. As outlier 
situations, we consider (classical) IOs and AOs (with contamination radius r = 10% each) 
as well as the special endogenous case where we substitute part of the state by a completely 
artificial signal. For the SSM we use an AR(2) process, i.e., an autoregressive process of order 
2, as state process, and the following hyper parameters: 



a o — n ' Qo = 









F t = ( 1 r ) , Z t = ( 1 ) , Q t = ( Q I ) , V t = ( 1 ) 



1 



(4.4) 



The innovations v t in the IO situation again stem from a contaminated bivariate normal dis- 
tribution given in (4.2) with moments of the contaminating distribution given by 

/£ = (30,0), Rc=(°q o ) • ( 4 - 5 ) 

The contaminating distribution of the errors e t of the observation process is A/"(10, 0.1). 
Then, for all different situations the filter estimates are computed using the rLS filter and the 
ADORE filter. Moreover, also the classical Kalman filter estimates were calculated. 

In Figure 2, the realizations of the state process, together with their filter estimates are 
displayed for different situations of contamination. The black line is again the true state process, 
while classical Kalman, rLS, and ADORE filters are plotted by a red line, a dashed green line, 
and a dot-dashed blue line, respectively. 



Endogenous contamination by artificial signal: In this case, whole parts of the state 
process are substituted by an artificial signal, more specifically by a so called block signal 
(cf., e.g., Donoho and Johnstone, 1994) which is piecewise constant with random length and 
amplitude. Here, the rLS.IO filter is able to track the state process best. The classical Kalman 
filter, as already seen in the previous section, is not able to follow the level shifts. Using the 
ADORE filter as non-parametric filtering method an obvious time delay in following the state 
signal can be seen. 
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Figure 2: Results of the simulated state-space model for different situations: left contamination 
by artifical signal, middle 10 contamination, right AO contamination 



IO contamination: Here, the rLS.IO filter is able to follow the true state, whereas the 
classical Kalman filter fails to track the spikes of the state signal. The ADORE filter as non- 
parametric alternative only fits an overall trend. In general, non-parametric filters as ADORE 
are specialized to reveal trends, trend changes or shifts of an underlying, possibly non-stationary 
signal in the presence of outliers and, according to our experience, they extremely smooth the 
underlying process. 



Spiky outliers: The rLS.AO filter is not affected by AOs whereas the classical Kalman filter 
is prone to them. The ADORE filter again shows the above mentioned properties and estimates 
more or less an overall trend of the underlying AR(2) process. 



4.3 Coping with non observed aspects 

One important aspect of reconstructing states in SSMs is observability, compare Anderson 
and Moore (1990, App. C), as in general matrix Z t will have a non-trivial null space with 
the consequence that state signals falling to this null space are not visible at filtering time. 
Smoothing may to some extent relieve this problem with a certain time delay, when subsequent 
transitions F s move the states in such a way that they become visible to Z s at a later stage. 

We illustrate this problem studying how the considered versions of the Kalman filter cope 
with such non observed aspects in the following setup: 

T= :■-)(! . F = | 1 1 ] , Z= ( I ° J ) . () ■ dkmO. I). n.iKll) 




1 J ' ^ 6V",",— J , ^ g) 



V = diag(0.1, 0.001) , a = (0,0,0)', Q = diag(l, 0.1, 0.001) . 

Here, we use as outlier specification the ones from Section 2.2 with r IO = r AO = 0.1, i.e., (2.5) 
and (2.6) in the AO case and (2.7) in the IO case. As contaminating distributions we chose 
A t di ~ multiv.Cauchy(0, Q) and Y t di ~ Cauchy/1000 (using R packages mvtnorm (Genz et al., 
2011) and MASS (Venables and Ripley, 2002)). 

In Figure 3, for different versions of the rLS, we display one realization of the state process 
in the ideal and in the real IO-contaminated situation together with the filter and smoother 
estimates. The black line is the true state process in the ideal situation, while the red line 
represents the IO-contaminated state process, i.e., the real situation. The dashed green line 
represents the rLS filter, the dot-dashed blue line the corresponding rLS smoother. 
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Figure 3: Filter estimates of the simulated state-space model using different filter and 
smoothers: left classical Kalman filter and smoother, middle IO-robust filter, right AO-robust 
filter 

Only the second coordinate, which lies in kcr Z t , of the state process is plotted. 

All three plots clearly reveal that none of the proposed filters is able to cope with this 
situation, i.e., to correctly follow the level shift in the second coordinate of the state process 
caused by an IO-contamination. 

5 Application 

In this section, we show an application of the procedures described above to data from a 
cooperation with the department for Mathematical Methods in Dynamics and Durability at 
Fraunhofer ITWM and Kaiserslautern University, in particular with Nikolaus Ruf and Jiirgen 
Franke. This data is part of a larger project they are involved. In the application, we deal with 
data taken from a vehicle moving on some track, which consists of four data channels, i.e., 

• time recorded in seconds ([s]); 

• vehicle speed spt, measured in meters per second ([m/s]); 

• measured altitude h t {[m]); 

• pitch angle speed at, measured in radians per second {[rad/s]). 

We are interested in slope estimation, more precisely in the change of altitude over distance, 
because this information cannot be captured so easily from cartographic information. 

The original data has been distorted by different factors, e.g. GPS measurement error, 
discreteness of the measurement process, change of the road surface, etc. If during the mea- 
surement the process signal was lost for short time because of tunnels on the way or for some 
other reasons, some part of the data is missing and we can observe e.g. sudden jumps of the 
altitude or speed, what is impossible in practice for the tracks. Such situations are very com- 
mon in reality, therefore we are interested in constructing reliable parametric models of such 
errors in order to reconstruct the data. 

With different simplifying assumptions on the observed data — to some degree for illustration 
purpose — we construct three models: 
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A linear time-invariant model with the following state vector and matrix: 









1 











1 


I) 














X t =\ ht \ F= 1 1 w t ~iV3(0,Q), g = diag(0, 0,0.01), 
V c* / V / 

where Ct = sp t +ia t , t = 0, T— 1 denotes the compound increment measured in ([m/s] x 
[rad/s]). Note that by construction, ct is not defined, therefore we additionally assume 
that ct = spxctT- 

The hyper-parameters of the observation equation in this model are the following 



Y t = I ~* J Z =[o o i ) £ * ~ ^a(0, V), V = diag(5, 0.01). 

In reality, the variances have been obtained by an application of an EM-type algorithm 
as the one by Shumway and Stoffer (1982). We mention that this EM- Algorithm is all but 
robust, being based on classical estimators for first and second moments. This robustness 
issue will be discussed elsewhere in more detail. 

In our application, though, we inspected the norms of the observation residuals AY t and 
the estimated innovations it and errors it which were obtained from the rLS-filters and 
-smoothers applied to the real data. Among these obtained ||Alf||, \\vt\\, ||e t ||, we did 
not observe outstanding values, which indicates that application of the classical EM- 
Algorithm in this case should be justified. 

The initial distribution of the state vector here is defined as follows 
a = (^1,0,0)', Q = diag(5, 1,0.01). 



In Figure 4, we plot the actual observations together with the respective reconstruction 
by means of the classical Kalman filter, the rLS.AO, and the rLS.IO in Models (Ml) and 
(M2) defined below, with a zoom-out of observations 100-130 to better distinguish the 
different models. The classical reconstruction is not clearly visible at the plot, since in 
case of these data, it almost coincides with the rLS.AO. 
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ure 4: Observations and reconstructed values for different filters at Models (Ml) and (M2) 

Other data sets we analyzed showed much stronger evidence for outliers, but for confiden- 
tiality reasons, we limit ourselves to this data set and instead discuss the behavior of our 
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procedures at outliers in generated data. We may mention though, that our procedures 
work well in the discussed problems. 



(M2) A linear time-varying model with the following state hyper-parameters: 

v t ~N 3 (0,Q), Q = diag(0, 0,0.05). 



X f = 




Ft = 



1 sp t At 
1 At 




Note that to complete the derivation of the model, we need to define a state matrix for 
t = 0, and since we do not have the vehicle speed at time t = we set it to the first 
observation spi, i.e. 

/ 1 s Pl At 
F = 1 At 
\ 

The hyper-parameters of the observation equation in this model are the following: 



Z = 



1 
1 



e t ~N 2 (0,V), y = diag(5, 0.005). 



The initial distribution of the state vector is again defined as follows 
a = (h u 0, 0)', Q = diag(5, 0.005, 0.005). 

(M3) A quadratic time-invariant model accounting for acceleration. This gives a nonlinear 
SSM in the notation of Section 3.4, specified as 

f t (X t _ u u t ,v t ) = AiXt-ttoXt-J + BXt-t+vt, 

z t (X t ,w t ,e t ) = ZX t + e t 

Here ® denotes the Kronecker product, i.e., the quadratic term can be written in the 
following form 

p 

A{X t ®X t )=Y,AXt]iX u fc = 0,..,T-l 
i=i 

with known p x p - matrix A = (A 1 \ ... | A p ). 

The hyper-parameters, states and observations of constructed for this application 
quadratic time-invariant model are the following 



X t 



( ht \ 

spt 
sp t 
at 
\ a t J 



( 



A 



\ 



->5x5 



J5x5 



J5x5 



^5x5 



V 



B 



( 1 \ 

1 At 



1 At 

\ / 



At 











v t ~ N 5 (0, Q), Q = diag(0, 0, 2, 0, 0.005), 
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Y, 



( b\ 

sp t 
s'Pt 
V ~at J 



( 1 \ 

10 

10 

V o o o o i J 



e t ~N 4 (0,V), ^ = diag(5,2,2,0.005). 



The initial distribution ol the state vector here is defined as follows 

a = (fti,spi, 0,0,0)', Q = diag(5,2,2,0.005,0.005). 



6 Simulation 

To see how our procedures (and some competitors) work in the outlier setting they are con- 
structed for, we produce an simulation study done in R, R Development Core Team (2012), 
in the framework of our package robKalman developed at http://r-forge.r-project.org/ 
pro j ects/robkalman/. 

For these simulations, we generated 10000 runs of data for two different models: 
Model (SimA), a one-dimensional time-invariant steady-state model with parameters 
F = Z = Q = V = 1, ao = 1 and time horizon T = 50 for comparison with the 
median-based filters of Section 3.6. More specifically, we use function hybrid. filter from 
R package robfilter, with specifications PRMH and MMH denoted by hybf and hybs below, 
respectively (for hybrid filter / smoother). Both procedures are used with fixed window width 
5 and minimum number of non- missing observations 2, reflecting the actual outlier situation. 

Second, we study Model (SimB) which takes over dimensions and typical parameters from 
the application of Section 5 as specified in detail in (4.6) and already used in Section 4.3. 

As outlier specification, in both models we use the ones from Section 2.2 with v io — ^*ao — 
0.1, i.e., (2.5) and (2.6) in the AO case and (2.7) in the IO case. As contaminating distributions 
we chose Cauchy for Xf <~ Cauchy(— 10, 1) and Y t di ~ Cauchy(5, 1) in Model (SimA), while 
in Model (SimB) we used Af - multiv.Cauchy(0, Q) and Y t dl ~ Cauchy (0, 1/1000) (using R 
packages mvtnorm (Genz et al., 2011) and MASS (Venables and Ripley, 2002)). 

The results for Model (SimA) are summarized in Table 1 and Figures 5-8, the ones for 
Model (SimB) in Table 2 and Figures 9-12. In the annotation we denote the smoother versions 
of rLS.AO and rLS.IO by SrLS.AO and SrLS.IO, respectively. 

In the tables, we display the empirical mean squared error (MSE) for each of the procedures 
at each situation. It is clearly visible that each filter does the job well it is made for: The classical 
Kalman filter is best in the ideal situation with rLS.IO and (a little less so) rLS.AO still close 
by. In the AO situation, the AO-robust filter is best (also compared to the median-type filters 
which comes second best), whereas classical Kalman, and, even worse rLS.IO, have problems. 
In the IO situation the situation changes dramatically: rLS.IO excels, whereas the classical 
Kalman filter shows its inertia, and, much worse, rLS.AO and hybf are not at all able to track 
these abrupt signal changes. For smoothing the situation is a bit different: At large we have the 
same picture as for filtering, however the IO-robust smoother is not doing a good job at all: 
the filter is much better here. This remains to be studied in more detail. In the ideal and AO 
situation however the smoothers do improve the filter (as they should). So it seems a strictly 
recursive smoother like the one we propose is well capable to deal with spiky outliers but much 
less so to track abrupt changes. 

As to the graphics, in Figures 5 and 9, we display the coordinatewise distribution of the 
reconstruction errors AA 35 = A 35 | 35 — X 35 for filtering and AA 35 = A 35 | 50 — A 35 for smoothing. 
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The columns of the panels are the situations (ideal, AO, and 10), the rows (for Model (SimB)) 
the state coordinates. In each panel, we display the filters first and then the smoothers, ordered 
as in the columns of the tables. In order to keep the boxes of the boxplots distinguishable, we 
skipped all outliers outside (—15, 15) in Model (SimA) and 8 times the coordinate- wise maximal 
interquartile range in Model (SimB). In addition to the tables of the MSE, these figures reveal 
that also in terms of the bulk of the data there are differences among the procedures: In Figure 9 
for Model (SimB), coordinate 1 is hardest to reconstruct, and shows large differences between 
smoother and filter. At least for coordinates 1 and 2, the central boxes are definitively smallest 
for the rLS.AO filter and its smoother in the AO situation, with even a large improvement by 
the smoother. The same goes for the rLS.IO filter in the 10 situation, whereas as already noted 
in the tables, the 10 smoother is less convincing in the 10 situation, although not really worse 
than the filter in terms of the central box, although coordinate 3 tells a different story here. 

In Figure 5 for Model (SimA), we scaled all panels to the same coordinates in order to be 
better able to compare the situations; otherwise the conclusions to be drawn parallel the ones 
for Model (SimA), except that in positions 4 an 8 in each panel we display the median type 
filters and smoother which lead to largest central boxes in the filters, and in the smoothers are 
in between the specialized robust smoother and the respective unsuitably robustificd smoother. 

The remaining figures display the distribution of the normed reconstruction errors, where in 
spite of Proposition A. 5 below, we have not changed the norm in the 10 situation. To visualize 
all outliers this time, we use a logarithmic scale for the y-axis which also emphasizes "inkers" 
(compare Hampcl et al. (1986, p. 140)), i.e., situations where the procedures behave extra- 
ordinarily well. Per se, for reconstruction, this is of course a beneficiary situation, but, in case 
inference is also of interest, this will lead to underestimation of the true variation. 

In Model (SimA) in the ideal and 10 situation, the median-type filters and smoothers excel 
in this direction, but also the rLS filters both perform better than the classical filter in this 
respect. For the smoother, the classical one is best, though. To the upper tail, the norms even 
for the worst situations surpasses the central box only by little (on log-scale, though). In the 
AO situation the specialized robust filters and smoothers have the shortest tail but as to the 
boxes are hardly distinguishable from the classical ones, while in the 10 situation only the 
rLS.IO has a convincing tail. 

In Model (SimB), in the ideal situation, the smoothers are clearly better than the filters but 
much less so in the outlier situations (although the AO smoother is significantly better than 
the filter). We also see that the classical procedures produce more inliers than the robustificd 
ones in the ideal and 10 situation, while in the AO situation no clear statement can be made. 



situation 




filter 






smoother 






Kalman 


rLS.IO rLS.AO 


hybf 


Kalman 


rLS.IO rLS.AO 


hybs 


id 


0.628 


0.679 0.808 


2.180 


0.447 


0.475 0.575 


0.939 


AO 


24.206 


30.379 1.446 


4.950 


14.087 


14.226 1.097 


1.475 


10 


30.175 


0.729 1850.977 


629.129 


66.538 


95.579 1846.870 


623.371 




Table 1 


empirical MSEs in 


Model (SimA) at t = 


35 for T = 50 





7 Discussion and conclusion 

Contribution In this paper we have presented robustifications of the Kalman filter and 
smoothers specialized either for damping of spiky outliers or for faster tracking a deviated series 
where the smoothers and the general IO-robust filter are new to the best of our knowledge. 
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IAXI 2 at Ideal Situation 




Figure 5: ||AA t || 2 in Model (SimA) at t = 35 for T = 50 in ideal situation; filtered (boxes 1-4) 
and smoothed (T = 50; boxes 5-8) versions 



IAXI 2 at AO Situation 




Figure 6: ||AA t || 2 in Model (SimA) at t = 35 for T = 50 in AO situation; filtered (boxes 1-4) 
and smoothed (T = 50; boxes 5-8) versions 

All our procedures are recursive, hence extra-ordinarily fast, so can be used in online 
problems and can easily be used to also robustify the Extended Kalman filter for non-linear 
state-space models. Our procedures are implemented to R, developed under r-forge, http: 
//r-f orge.r-project.org/ and soon will be submitted to CRAN, http://cran.r-project. 
org/. 

We have demonstrated the superior behavior of our procedures at the outlier situations 
they are made for and, in a study of stylized "realistic" outlier situations we showed that they 
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IAXI Z at 10 Situation 



hybf 



hybs 



Figure 7: ||AAJ 2 in Model (SimA) at t = 35 for T = 50 in 10 situation; filtered (boxes 1-4) 
and smoothed (T = 50; boxes 5-8) versions 



X at Ideal Situation 



AX at AO Situation 



6^ 



IB' 



Figure 8: AX t in Model (SimA) at t — 35 in all dimensions and in different situations; filtered 
(boxes 1-4) and smoothed (T = 50; boxes 5-8) versions 



situation 




filter 






smoother 






Kalman 


rLS.IO 


rLS.AO 


Kalman 


rLS.IO 


rLS.AO 


id 


0.034 


0.047 


1.725 


0.011 


0.013 


1.514 


AO 


1995.207 


7676.445 


5.661 


5520.957 


32270.199 


5.262 


IO 


175.484 


3.553 


7515.495 


116.626 


86.502 


7513.354 



Table 2: empirical MSEs in Model (SimB) at t = 35 for T = 50 
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lAXr at Ideal Situation 



Figure 9: \\AX t \\ 2 in Model (SimB) at t = 35 for T = 50 in ideal situation; filtered (boxes 1-3) 
and smoothed (T = 50; boxes 4-6) versions 



lAXr at AO Situation 



Figure 10: \\AX t \\ 2 in Model (SimB) at t = 35 for T = 50 in AO situation; filtered (boxes 1-3) 
and smoothed (T = 50; boxes 4-6) versions 



also can cover with a wider variety of outlier situations. These stylized situations also helped 
to identify certain flaws of the procedures. 

In particular the new rLS.AO smoother and the rLS.IO filter seem to be recommendable 
already as they are. As we have seen in the evaluations, further research is needed though to 
improve the IO-robust smoother which could not convince so far. 

As to the theoretical contributions, for the tracking problem we have thoroughly settled 
the case of non invertible observation matrices, i.e.; the situation when certain directions of the 
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IAXI Z at 10 Situation 



E 



KF rLS.IO rLS.AO KS SrLS.IO SrLS.AO 



Figure 11: \\AX t \\ 2 in Model (SimB) at t = 35 for T = 50 in 10 situation; filtered (boxes 1-3) 
and smoothed (T = 50; boxes 4-6) versions 

state are not visible. It turned out that the taken passage to observation space in order to define 
the rLS.IO involves no extra costs in terms of rank defects compared to the classical Kalman 
filter. In order to have a well-posed problem though, as demonstrated in Section 4, we need to 
pass to a semi-norm which ignores directions, no filter can see. With this modified criterion, 
we can establish our 10 robust filter as approximately optimal, where approximately means 
that it would be optimal if the ideal conditional expectation were linear. Now the conditional 
expectation is not exactly linear, but, as shown empirically in specific examples, it is close to 
linear — at least in a central region. 

Outlook In reality "pure" 10 or AO situations hardly occur. Hence we think it is of high 
importance to thoroughly study hybrid versions of our filters (and/or smoothers) combining 
the two types of filters (10 and AO) we have discussed so far, to use them in mixed situations. 
So far we have only checked a heuristics based on the sequence of the normed observation 
residuals ||AF t || in a rolling window. In situation where IOs and AOs are well separated this 
already works decently, but the procedure easily gets confused once in a window we have both 
IOs and AOs. 

In non-linear state space models, the unscented Kalman filter, compare Wan and van der 
Merwe (2002) deserves a robustification, which could easily take up the ideas of this paper. 

Finally, robust filters and smoothers are a key ingredient of the EM algorithm (Shumway 
and Staffer, 1982) to estimate unknown (hyper-) parameters with interesting application to the 
fitting of stochastic differential equations to financial data. 
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Figure 12: AX t in Model (SimB) at t = 35 in all dimensions and in different situations; filtered 
(boxes 1-3) and smoothed (T = 50; boxes 4-6) versions 

A Appendix 

A.l Optimality of the classical Kalman filter 

Optimality of the classical Kalman filter among all linear filters in Z/2-sense and, under normality of 
the error and innovation distributions, among all measurable filters is a well-known fact, compare, e.g. 
Anderson and Moore (1990, Sec. 5.2). As we will need some of the arguments later, let us complement 
this fact by some generalization to arbitrary norms generated by a quadratic form and by a thorough 
treatment of the case of singularities in the covariances arising in the definition of the Kalman gain 
from (3.5). To do so, we take the orthogonal decomposition of the Hilbert into closed linear subspaces 
as lin(F 1: ( t _ 1 )) © lin(AYt) as granted and for X — X t — X t \ t _ l and Y = AY t as given in (3.5), derive 
K t . To this end, for any matrix A let us denote by A~ the generalized inverse of A with the defining 
properties 

A~AA-=A~, AA~A = A, A~ A = (A~ A) T , AA~ — (AA~) T (A.l) 

and for D a positive semi-definite symmetric matrix in K pxp and for x G W define the semi-norm 
generated by D as \\x\\% := x T D~x. 

Lemma A.l. Let p,q G N, P some probability and X G L P 2 {P), Y G L q 2 (P), EX = 0, EY = 0, 
where for some Z G E 9Xp , and some e G L\{P) independent of X, Y = ZX + e. Let D a positive 
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semi-definite symmetric matrix inR pxp . Then 

k = cov{x, y) cov(yy (a.2) 

solves 

E\\X-KY\\ 2 D = mini, K G R pxq (A.3) 

K is unique up to addition of some A G R pxq such that ACovY = and some B G R pxq such that 
DB = 0. If K = DD~K, K has smallest Frobenius norm among all solutions K to (A.3). 

Proof. Denote L P (P,D) the Hilbert space generated by all R p valued random variables U such that 
Ep II^IId < oo— after a passage to equivalence classes of all random variables U, U' such that Ep \\U — 
U'f D = 0. Let S = Cov(X) and V = Cov(e). Then Cov(X, V) = SZ T and Cov(Y) = ZSZ T + V. 
Denote the approximation space {KY \ K G E pX9 } C L%(P,D) by AC. AC is a closed linear subspace 
of L P (P,D), hence by Rudin (1974, Thm. 4.10) there exists a unique minimizer X — KY G AC to 
problem (A.3). It is characterized by 

E(X — KY) T D KY = 0, \/K G R pxq (A.4) 

Plugging in K = De^eJ, {e^}, {ij} canonical bases of W, E 9 , respectively, we see that (A.4) is 
equivalent to 

E7r(A — KY)Y T — <^ tyK Cov(Y) = tt Cov(A, Y ) (A.5) 

where -k — D~ D is the orthogonal projector onto the column space of D. But y G K 9 can only lie in 
kerCov(Y) if y G kerCov(X, Y). Hence indeed K Cov(Y) = Cov(X,Y), and the uniqueness assertion 
is obvious. We write ttd and ttc for the orthogonal projectors to the column spaces of D and Cov(F), 
respectively and 7fc = \ — nc, = I P — ttd for the corresponding complementary projectors. Then we 
see that if = Ktt c and for any A e R pxq with A Cov F = we have A = Atv c and for any B G R pxq 
with Di? = we have B = ttdB; hence 

\\K + A + B\\ 2 = trK T K + 2trA T k + 2trB T K + tr{A + B) T {A + B) 
= \\K\\ 2 + 2tr Kn c n c A T + 2trKn D n D B T + \\A + B\\ 2 
= \\K\\ 2 + \\A + B\\ 2 >\\K\\ 2 with equality iff A + B = 0. 

□ 



A.2 Sketch of the optimality of the rLS.AO 

(One-Step)-optimality of the rLS The rLS filter is optimally-robust in some sense: To see this, 
in a first step we essentially boil down our SSM to (3.9), i.e., we have an unobservable but interesting 
state X ~ P x (dx), where for technical reasons we assume that in the ideal model E \ X\ 2 < oo. Instead 
of X, for some Z G R 9Xp , we rather observe the sum Y = ZX + e of X and a stochastically independent 
error e. As (wide-sense) AO model, we consider the SO outlier of (2.5), (2.6). The corresponding 
neighborhood is defined as 

U so {r) = (J {£(X, Y T °) | Y rc acc. to (2.5) and (2.6) with radius sj (A.6) 

0<s<r 

In this setting we may formulate two typical robust optimization problems, i.e., a minimax formulation, 
and, in the spirit of Hampel (1968, Lemma 5), a formulation where robustness enters as side condition 
on the bias to be fulfilled on the whole neighborhood 

[Minmax-SO] max M E ro \X - f(Y'°)\ 2 = min/ ! (A. 7) 

[Lemma-5] E id \X - f{Y id )\ 2 = min/ ! s.t. sup M | E ro f(Y T °) -EX\ <b (A.8) 

Then one can show that setting D(y) = E id [X|y = y] — EX, the solution to both problems is 
f(y) = E X + H p (D(y)) (with b = p/r in Problem (A.8)), and that this is just the (one-step) rLS, once 
E id LY|F] is linear in Y. A proof to this assertion is given in Ruckdeschel (2010a, Thm. 3.2). 
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Remark A. 2. (a) As mentioned in Section 3.2, Cipra and Hanzak (2011) show an optimality sim- 
ilar to the one for Problem (A. 8), and hence, non- surprisingly come up with a similar procedure. 

(b) The ACM filter by Masreliez and Martin (1977), an early competitor to the rLS, by analogy 
applies Huber (1964) 's minimax variance result to the "random location parameter X " setting of (3.9). 
They come up with redescenders as filter f. Hence the ACM filter is not so much vulnerable in the 
extreme tails but rather where the corresponding ip function takes its maximum in absolute value. Care 
has to be taken, as such "inliers" producing the least favorable situation for the ACM are much harder 
to detect on naive data inspection, in particular in higher dimensions. 

(c) For exact SO-optimality of the rLS-filter, linearity of the ideal conditional expectation is crucial. 
However, one can show that E id [AX|AY] is linear iff AX is normal, but, having used the rLS-filter 
in the AX-past, normality cannot hold, see Ruckdeschel (2010a, Prop.'s 3.4, 3.6). 

(d) Although rLS fails to be SO-optimal for t > 1, it does performs quite well at both simulations 
and real data. To some extent this can be explained by passing to a certain extension of the original 
SO -neighborhoods. For details see (Ruckdeschel, 2010a, Thm. 3.10, Prop. 3.11). 

A.3 Optimality of the rLS.IO 

This section discusses (one-step) optimality of the rLS.IO in some detail. We omit time indices and 
write E for E t i t _i. To start, let us again look at the boiled down model (3.9) where we interchange 
the role of e and X, and note that X — f(Y) = e — g(Y) for f(Y) — Y — g(Y). Hence in this simple 
model, the optimal reconstruction of a corrupted X assuming that e is still from the ideal distribution 
is just Y — g(Y), g(Y) the optimal reconstruction of e in the same situation. 

In notation, let us write oP(a|&) for the best linear reconstruction of a by means of b, i.e., the 
orthogonal projection of a onto the closed linear space generated by b. 

Assuming linear conditional expectations and mutatis mutandis in Ruckdeschel (2010a, Thm. 3.2), 
the optimally-robust reconstruction of e given Y in the sense of Problems (A. 7), (A. 8) is just 
Ht(oP(e\Y)) — with the same caveats as to the optimality for larger time indices as in Remark A. 2. 
But again, oP(ejY) = oP(F - X\Y) = Y — oP(X\Y) so the IO-optimal procedure f lo is 

f 10 {Y) = Y-H b {Y-oP{X\Y)) (A.9) 

Details as to the translation of the contamination neighborhoods and exact formulations of the 
optimality results are given in Ruckdeschel (2010a). 

The general setup with some arbitrary Z G K 9Xp , where Z in general is not invertible, and more- 
over, even Z T Y>Z may be singular, is not trivial, though. For instance, our preceding argument so far 
only covers reconstruction of ZX, but at this stage it is not obvious how to optimally derive a recon- 
struction of X from this. In particular, in this general case, there are directions which our (robustified) 
reconstruction cannot see — at least all directions in ker Z. So an unbounded criterion like MSE would 
play havoc once unbounded contamination happens in these directions. So in this context, the best 
we can do is optimally reconstructing ZX on the whole neighborhood generated by outliers in X and 
then, in a second step, for this best reconstruction of ZX, find the best back-transform to X in the 
ideal model setting. The question is how much we loose by this. To this end, note that 

oP(e|Y) = oP(Y - ZX\Y) = Y — ZoP(X\Y) = (I, - ZK)Y (A.10) 

For Z E from (3.12), we introduce the orthogonal projector onto the column space of Z T Y,Z and 
its orthogonal complement as 

71~Z,E = ZZ^, 7fz,E := Iq — 71~Z,£ (A. 11) 

Then we have the following Lemma: 

Lemma A.3. (a) For any positive definite D, Z E from (3.12) solves 

E ld \\X- AoP(ZX\Y)\\ 2 D = mini, A G W xq (A.12) 
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(b) Y}Z t tvz,s = 0; in particular, no matter of the rank of Z or irz,s, with K — Y,Z r C , 

Z Y 'ZK = K (A.13) 

Proof, (a) As in Lemma A.l, we see that 

A = EZ T K T Z T (ZKCov(Y)K T Z T y (A.14) 

Abbreviating ZT,Z T by B and Cov(Y) by C, this gives A = Y,Z T C~ B(BC~ B)~ , and with E. B the 
symmetric root of E, and with G = E. 5 Z T , this becomes A = T,,sGC~G T G(G T GC~G T G)~ . Next 
we pass to the singular value decomposition of G = USW T , with U, W corresponding orthogonal 
matrices in R pxp and M 9X9 , respectively, and S € W xq a matrix with the singular values on the 
"diagonal entries" Si,i, i = 1, . . . , min(p, q) and Sij = 0, i ^ j; furthermore, > for i — 1, . . . , d, 
d < min(p, q) and else. Using (aba T )~ — (a T )~ 1 b~ a~ x for a invertible and setting T = S T S, we 
obtain 

A = T,. 5 USW T C'WTW" r (WTW r C'WTW T y = E. 5 USW T C'WT(TW T C'WTyW" r 

As the expressions of the symmetric matrices W T C~W are surrounded by S (resp. T-)-terms, we 
may replace them with a matrix R G K 9X? with only entries in the upper d x d block, i.e., A — 
T,. 5 USRT(TRTyW T and as R now is compatible with S and T, 

A = -E. 5 USRl d R-T~W T = T,. 5 USRR~T-W T , for Id = TT~ 

Now, as C = WTW T + V, W r C~W = (T + W T VW)~ , in particular the upper d x d block i? d of 
R = l d (T + W T VWyi d is invertible and 

i = Z. 5 UST-W T (= T,. b US~W T ) = X. 5 USW T WT-W T = £Z T B~ = Z E 

(b) We start by noting that EZ T 7f z , E = Z. 5 USW T W(l q - l d )W T = 0. For (A.13), we write K = 
T.Z T {n z ^+nzs)C- = XZ T n z ,i;C- = Z^ZK. 

□ 

As a consequence of assertion (b) in the preceding Lemma, we obtain 

Corollary A. 4. No matter of the rank of Z or 7tz,e, 

6P(X\Y) = Z S (Y -oP(e\Y)) (A.15) 

that is, we can exactly recover oP(X\Y) from oP(e|Y), and passing over the reconstruction of ZX first 
does not cost us anything in efficiency compared to the direct route. 

Proof. We only note that Y - oP(e|Y) = oP{ZX\Y) = ZKY. □ 

To keep things well-defined in this setting where we have "invisible directions" in the state, we 
may recur to passing to a semi-norm in A-space which ignores such directions. A possible candidate 
for D in Lemma A.l is 

D~ = (Z^ZY^Z^Z (A.16) 

On the one hand, as we show below, invisible directions get ignored, on the other hand, by (A.13), no 
direction visible for the classically optimal procedure is lost. 

Proposition A. 5. Using D from (A.16) and assuming observation errors from the ideal situation, 
maximal MSE error for rLS.IO measured in this norm remains bounded for 10 contamination. With 
this norm, K is smallest possible solution to (A. 3) in Frobenius norm. 
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Proof. The error term e = X — X for the rLS.IO can be written as 

e = X - Z S (Y - H b (Y - ZKY)) = (I p - Z S Z)X - Z E (e - H b ((l q - ZJQY)) 

As (Z E Z) 2 = Z E Z, we see that (I p - Z S Z)(Z S Z) = 0, so that in D-semi-norm, the (I p - Z E Z)X 
terms cancel out and we get 

e T D~e = [Z^{e~H b (-))] T {Z^Zy^Z^Z[Z^{e-H b {-))] = 
= (e-H b (-)) T (Z s ) T (Z s Z) T E-Z s ZZ s (s-H b (-)) = 
= {e - H b {-)) T B~ {e - H b {-)) < 2e T B~ e + 2H b ( ■ ) T B~ H b ( • ) 

so MSE is bounded by 2tr(B~(V + b 2 I q )). The second assertion is an immediate consequence of 
Lemma A.l and Lemma A. 3(b). □ 

Note that changing the norm in the F-space is not necessary for boundedness reasons, as with only 
ideally distributed e, the reconstruction of ZX can be achieved such that no matter how largely AX 
is contaminated, the maximal MSE remains bounded. 
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